Dataset statistics
| Number of variables | 18 |
|---|---|
| Number of observations | 17464 |
| Missing cells | 17472 |
| Missing cells (%) | 5.6% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 2.4 MiB |
| Average record size in memory | 144.0 B |
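The summary figures above can be cross-checked with a little arithmetic. The sketch below assumes that the missing cells come only from the two columns the report flags as having missing values (`geo` with 17464 and `clean_text` with 8), and that the total memory size is the average record size multiplied by the number of observations. The variable names are illustrative and are not part of pandas-profiling.

```python
# Figures copied from the report; variable names are hypothetical.
n_variables = 18
n_observations = 17464
avg_record_bytes = 144.0

# Assumption: only geo and clean_text contribute missing cells.
missing_per_column = {"geo": 17464, "clean_text": 8}

missing_cells = sum(missing_per_column.values())
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Total memory in MiB (1 MiB = 2**20 bytes).
total_mib = avg_record_bytes * n_observations / 2**20

print(missing_cells, round(missing_pct, 1), round(total_mib, 1))
```

Under these assumptions the computed values round to the same figures the report shows for missing cells, missing cells (%), and total size in memory.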
Variable types
| NUM | 13 |
|---|---|
| CAT | 3 |
| BOOL | 1 |
| UNSUPPORTED | 1 |
Reproduction
| Analysis started | 2020-08-08 22:44:19.563083 |
|---|---|
| Analysis finished | 2020-08-08 22:44:56.971269 |
| Duration | 37.41 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Warning | Type |
|---|---|
| username has a high cardinality: 10490 distinct values | High cardinality |
| stopwords has a high cardinality: 14801 distinct values | High cardinality |
| clean_text has a high cardinality: 16317 distinct values | High cardinality |
| likes_count is highly correlated with retweets_count | High correlation |
| retweets_count is highly correlated with likes_count | High correlation |
| positive is highly correlated with Unnamed: 0 and 2 other fields | High correlation |
| Unnamed: 0 is highly correlated with positive | High correlation |
| date is highly correlated with positive and 1 other field | High correlation |
| death is highly correlated with date and 1 other field | High correlation |
| geo has 17464 (100.0%) missing values | Missing |
| clean_text is uniformly distributed | Uniform |
| Unnamed: 0 has unique values | Unique |
| geo is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| replies_count has 10882 (62.3%) zeros | Zeros |
| retweets_count has 11418 (65.4%) zeros | Zeros |
| likes_count has 8671 (49.7%) zeros | Zeros |
| stopwords_count has 682 (3.9%) zeros | Zeros |
| Sentiment has 4437 (25.4%) zeros | Zeros |
Unnamed: 0
Real number (ℝ≥0)
| Distinct count | 17464 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8731.5 |
|---|---|
| Minimum | 0 |
| Maximum | 17463 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 873.15 |
| Q1 | 4365.75 |
| median | 8731.5 |
| Q3 | 13097.25 |
| 95-th percentile | 16589.85 |
| Maximum | 17463 |
| Range | 17463 |
| Interquartile range (IQR) | 8731.5 |
Descriptive statistics
| Standard deviation | 5041.566886 |
|---|---|
| Coefficient of variation (CV) | 0.577399861 |
| Kurtosis | -1.2 |
| Mean | 8731.5 |
| Median Absolute Deviation (MAD) | 4366 |
| Skewness | 0 |
| Sum | 152486916 |
| Variance | 25417396.67 |
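Because this column has 17464 distinct values running from 0 to 17463, it behaves like a plain row index, and the descriptive statistics above can be reproduced from that alone. The sketch below assumes the column is exactly the integers 0 through 17463 and that the reported variance is the sample variance (dividing by n - 1). Both are assumptions inferred from the figures, not something the report states.

```python
import math

n = 17464
values = range(n)  # assumed contents of the column: 0, 1, ..., 17463

total = sum(values)
mean = total / n

# Sample variance (divides by n - 1); this is an assumption about the report.
variance = sum((v - mean) ** 2 for v in values) / (n - 1)
std = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

print(total, mean, round(variance, 2), round(std, 6), round(cv, 9))
```

Under these assumptions the results agree with the Sum, Mean, Variance, Standard deviation, and Coefficient of variation rows of the table.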
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 2047 | 1 | < 0.1% | |
| 17037 | 1 | < 0.1% | |
| 14978 | 1 | < 0.1% | |
| 12931 | 1 | < 0.1% | |
| 2692 | 1 | < 0.1% | |
| 645 | 1 | < 0.1% | |
| 6790 | 1 | < 0.1% | |
| 4743 | 1 | < 0.1% | |
| 10896 | 1 | < 0.1% | |
| 485 | 1 | < 0.1% | |
| Other values (17454) | 17454 | 99.9% |
| Value | Count | Frequency (%) | |
| 0 | 1 | < 0.1% | |
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 17463 | 1 | < 0.1% | |
| 17462 | 1 | < 0.1% | |
| 17461 | 1 | < 0.1% | |
| 17460 | 1 | < 0.1% | |
| 17459 | 1 | < 0.1% |
date
Real number (ℝ≥0)
| Distinct count | 198 |
|---|---|
| Unique (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.5936496622995878e+18 |
|---|---|
| Minimum | 1579651200000000000 |
| Maximum | 1596672000000000000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | 1.5796512e+18 |
|---|---|
| 5-th percentile | 1.5848352e+18 |
| Q1 | 1.5937344e+18 |
| median | 1.5945984e+18 |
| Q3 | 1.5955488e+18 |
| 95-th percentile | 1.5964128e+18 |
| Maximum | 1.596672e+18 |
| Range | 1.70208e+16 |
| Interquartile range (IQR) | 1.8144e+15 |
Descriptive statistics
| Standard deviation | 3.417583284e+15 |
|---|---|
| Coefficient of variation (CV) | 0.002144500994 |
| Kurtosis | 4.833024516 |
| Mean | 1.593649662e+18 |
| Median Absolute Deviation (MAD) | 8.64e+14 |
| Skewness | -2.298792455 |
| Sum | -4.639104828e+18 |
| Variance | 1.167987551e+31 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1.5945984e+18 | 591 | 3.4% | |
| 1.5939936e+18 | 548 | 3.1% | |
| 1.5941664e+18 | 514 | 2.9% | |
| 1.59408e+18 | 486 | 2.8% | |
| 1.594944e+18 | 484 | 2.8% | |
| 1.5948576e+18 | 455 | 2.6% | |
| 1.5943392e+18 | 450 | 2.6% | |
| 1.593648e+18 | 449 | 2.6% | |
| 1.5960672e+18 | 430 | 2.5% | |
| 1.5942528e+18 | 424 | 2.4% | |
| Other values (188) | 12633 | 72.3% |
| Value | Count | Frequency (%) | |
| 1.5796512e+18 | 17 | 0.1% | |
| 1.5797376e+18 | 25 | 0.1% | |
| 1.579824e+18 | 16 | 0.1% | |
| 1.5799104e+18 | 16 | 0.1% | |
| 1.5799968e+18 | 18 | 0.1% |
| Value | Count | Frequency (%) | |
| 1.596672e+18 | 225 | 1.3% | |
| 1.5965856e+18 | 275 | 1.6% | |
| 1.5964992e+18 | 335 | 1.9% | |
| 1.5964128e+18 | 367 | 2.1% | |
| 1.5963264e+18 | 332 | 1.9% |
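The values in this column look like timestamps stored as integer nanoseconds since the Unix epoch, which would explain why the report treats the column as numeric. The sketch below converts the reported minimum and maximum under that assumption. The helper name `ns_to_utc` is hypothetical.

```python
from datetime import datetime, timezone

def ns_to_utc(ns):
    """Convert integer nanoseconds since the Unix epoch to a UTC datetime."""
    seconds = ns // 1_000_000_000  # drop the sub-second part
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Minimum and maximum copied from the table above.
first_day = ns_to_utc(1579651200000000000)
last_day = ns_to_utc(1596672000000000000)
span_days = (last_day - first_day).days

print(first_day.date(), last_day.date(), span_days)
```

If the assumption holds, the span in days plus one equals the distinct count of 198, which suggests one value per calendar day. The product of the mean and the number of observations is also far larger than a signed 64-bit integer can hold, so the negative Sum is probably an overflow artifact and not a meaningful statistic.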
username
Categorical
| Distinct count | 10490 |
|---|---|
| Unique (%) | 60.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 136.4 KiB |
| Value | Count | Frequency (%) | |
| realdonaldtrump | 2524 | 14.5% | |
| washingtonpost | 576 | 3.3% | |
| nytimes | 267 | 1.5% | |
| freddiesirmans | 160 | 0.9% | |
| bornwildm | 116 | 0.7% | |
| democratboricua | 94 | 0.5% | |
| nygovcuomo | 83 | 0.5% | |
| davidhamer_1951 | 59 | 0.3% | |
| sudiptamalakar4 | 46 | 0.3% | |
| ykhalim | 45 | 0.3% | |
| Other values (10480) | 13494 | 77.3% |
Length
| Max length | 15 |
|---|---|
| Median length | 12 |
| Mean length | 11.8397847 |
| Min length | 2 |
replies_count
Real number (ℝ≥0)
| Distinct count | 2672 |
|---|---|
| Unique (%) | 15.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2756.2297297297296 |
|---|---|
| Minimum | 0 |
| Maximum | 193481 |
| Zeros | 10882 |
| Zeros (%) | 62.3% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 2 |
| 95-th percentile | 18507.55 |
| Maximum | 193481 |
| Range | 193481 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 10462.96546 |
|---|---|
| Coefficient of variation (CV) | 3.79611516 |
| Kurtosis | 61.77067185 |
| Mean | 2756.22973 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.541899281 |
| Sum | 48134796 |
| Variance | 109473646.3 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 10882 | 62.3% | |
| 1 | 2209 | 12.6% | |
| 2 | 493 | 2.8% | |
| 3 | 174 | 1.0% | |
| 4 | 68 | 0.4% | |
| 5 | 58 | 0.3% | |
| 6 | 35 | 0.2% | |
| 7 | 32 | 0.2% | |
| 13 | 30 | 0.2% | |
| 12 | 30 | 0.2% | |
| Other values (2662) | 3453 | 19.8% |
| Value | Count | Frequency (%) | |
| 0 | 10882 | 62.3% | |
| 1 | 2209 | 12.6% | |
| 2 | 493 | 2.8% | |
| 3 | 174 | 1.0% | |
| 4 | 68 | 0.4% |
| Value | Count | Frequency (%) | |
| 193481 | 1 | < 0.1% | |
| 191133 | 1 | < 0.1% | |
| 187705 | 1 | < 0.1% | |
| 171777 | 1 | < 0.1% | |
| 169276 | 1 | < 0.1% |
retweets_count
Real number (ℝ≥0)
| Distinct count | 2891 |
|---|---|
| Unique (%) | 16.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3547.869617498855 |
|---|---|
| Minimum | 0 |
| Maximum | 216656 |
| Zeros | 11418 |
| Zeros (%) | 65.4% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 2 |
| 95-th percentile | 25920.85 |
| Maximum | 216656 |
| Range | 216656 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 10626.27552 |
|---|---|
| Coefficient of variation (CV) | 2.995114439 |
| Kurtosis | 29.21936164 |
| Mean | 3547.869617 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.387111108 |
| Sum | 61959995 |
| Variance | 112917731.4 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 11418 | 65.4% | |
| 1 | 1343 | 7.7% | |
| 2 | 430 | 2.5% | |
| 3 | 214 | 1.2% | |
| 4 | 127 | 0.7% | |
| 5 | 75 | 0.4% | |
| 6 | 48 | 0.3% | |
| 7 | 44 | 0.3% | |
| 8 | 31 | 0.2% | |
| 10 | 31 | 0.2% | |
| Other values (2881) | 3703 | 21.2% |
| Value | Count | Frequency (%) | |
| 0 | 11418 | 65.4% | |
| 1 | 1343 | 7.7% | |
| 2 | 430 | 2.5% | |
| 3 | 214 | 1.2% | |
| 4 | 127 | 0.7% |
| Value | Count | Frequency (%) | |
| 216656 | 1 | < 0.1% | |
| 117144 | 1 | < 0.1% | |
| 111753 | 1 | < 0.1% | |
| 110900 | 1 | < 0.1% | |
| 108547 | 1 | < 0.1% |
likes_count
Real number (ℝ≥0)
| Distinct count | 3152 |
|---|---|
| Unique (%) | 18.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16163.446518552451 |
|---|---|
| Minimum | 0 |
| Maximum | 808801 |
| Zeros | 8671 |
| Zeros (%) | 49.7% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 7 |
| 95-th percentile | 115019.55 |
| Maximum | 808801 |
| Range | 808801 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 50353.45501 |
|---|---|
| Coefficient of variation (CV) | 3.115267215 |
| Kurtosis | 30.8175804 |
| Mean | 16163.44652 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 4.712530545 |
| Sum | 282278430 |
| Variance | 2535470432 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 8671 | 49.7% | |
| 1 | 2298 | 13.2% | |
| 2 | 943 | 5.4% | |
| 3 | 468 | 2.7% | |
| 4 | 300 | 1.7% | |
| 5 | 200 | 1.1% | |
| 6 | 148 | 0.8% | |
| 7 | 113 | 0.6% | |
| 8 | 83 | 0.5% | |
| 9 | 80 | 0.5% | |
| Other values (3142) | 4160 | 23.8% |
| Value | Count | Frequency (%) | |
| 0 | 8671 | 49.7% | |
| 1 | 2298 | 13.2% | |
| 2 | 943 | 5.4% | |
| 3 | 468 | 2.7% | |
| 4 | 300 | 1.7% |
| Value | Count | Frequency (%) | |
| 808801 | 1 | < 0.1% | |
| 707804 | 1 | < 0.1% | |
| 620298 | 1 | < 0.1% | |
| 581156 | 1 | < 0.1% | |
| 561044 | 1 | < 0.1% |
video
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 136.4 KiB |
| Value | Count | Frequency (%) | |
| 0 | 17085 | 97.8% | |
| 1 | 379 | 2.2% |
positive
Real number (ℝ≥0)
| Distinct count | 187 |
|---|---|
| Unique (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3186199.8137883646 |
|---|---|
| Minimum | 2 |
| Maximum | 4852143 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 42169 |
| Q1 | 2786467 |
| median | 3350434 |
| Q3 | 4093266 |
| 95-th percentile | 4694126 |
| Maximum | 4852143 |
| Range | 4852141 |
| Interquartile range (IQR) | 1306799 |
Descriptive statistics
| Standard deviation | 1232355.54 |
|---|---|
| Coefficient of variation (CV) | 0.3867791136 |
| Kurtosis | 0.8071931655 |
| Mean | 3186199.814 |
| Median Absolute Deviation (MAD) | 618190 |
| Skewness | -1.106819792 |
| Sum | 5.564379355e+10 |
| Variance | 1.518700176e+12 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 3350434 | 591 | 3.4% | |
| 2928590 | 548 | 3.1% | |
| 3042503 | 514 | 2.9% | |
| 2980356 | 486 | 2.8% | |
| 3626881 | 484 | 2.8% | |
| 3549648 | 455 | 2.6% | |
| 3167984 | 450 | 2.6% | |
| 2732244 | 449 | 2.6% | |
| 4467852 | 430 | 2.5% | |
| 3101339 | 424 | 2.4% | |
| Other values (177) | 12633 | 72.3% |
| Value | Count | Frequency (%) | |
| 2 | 120 | 0.7% | |
| 3 | 40 | 0.2% | |
| 4 | 8 | < 0.1% | |
| 6 | 8 | < 0.1% | |
| 7 | 11 | 0.1% |
| Value | Count | Frequency (%) | |
| 4852143 | 225 | 1.3% | |
| 4797959 | 275 | 1.6% | |
| 4745694 | 335 | 1.9% | |
| 4694126 | 367 | 2.1% | |
| 4644565 | 332 | 1.9% |
death
Real number (ℝ≥0)
| Distinct count | 162 |
|---|---|
| Unique (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 118527.3047984425 |
|---|---|
| Minimum | 2.0 |
| Maximum | 151483.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 480 |
| Q1 | 122140 |
| median | 127909 |
| Q3 | 137602 |
| 95-th percentile | 147631 |
| Maximum | 151483 |
| Range | 151481 |
| Interquartile range (IQR) | 15462 |
Descriptive statistics
| Standard deviation | 37236.70137 |
|---|---|
| Coefficient of variation (CV) | 0.3141613777 |
| Kurtosis | 4.405960671 |
| Mean | 118527.3048 |
| Median Absolute Deviation (MAD) | 7062 |
| Skewness | -2.3240636 |
| Sum | 2069960851 |
| Variance | 1386571929 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 127909 | 591 | 3.4% | |
| 122898 | 548 | 3.1% | |
| 124628 | 514 | 2.9% | |
| 123821 | 486 | 2.8% | |
| 131401 | 484 | 2.8% | |
| 2 | 479 | 2.7% | |
| 130450 | 455 | 2.6% | |
| 126349 | 450 | 2.6% | |
| 121542 | 449 | 2.6% | |
| 144114 | 430 | 2.5% | |
| Other values (152) | 12578 | 72.0% |
| Value | Count | Frequency (%) | |
| 2 | 479 | 2.7% | |
| 4 | 17 | 0.1% | |
| 5 | 9 | 0.1% | |
| 8 | 20 | 0.1% | |
| 11 | 28 | 0.2% |
| Value | Count | Frequency (%) | |
| 151483 | 225 | 1.3% | |
| 150232 | 275 | 1.6% | |
| 148807 | 335 | 1.9% | |
| 147631 | 367 | 2.1% | |
| 147112 | 332 | 1.9% |
word_count
Real number (ℝ≥0)
| Distinct count | 74 |
|---|---|
| Unique (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.87597343105818 |
|---|---|
| Minimum | 1 |
| Maximum | 91 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 12 |
| Q1 | 27 |
| median | 38 |
| Q3 | 46 |
| 95-th percentile | 52 |
| Maximum | 91 |
| Range | 90 |
| Interquartile range (IQR) | 19 |
Descriptive statistics
| Standard deviation | 12.66368262 |
|---|---|
| Coefficient of variation (CV) | 0.3529850596 |
| Kurtosis | -0.3203013052 |
| Mean | 35.87597343 |
| Median Absolute Deviation (MAD) | 9 |
| Skewness | -0.5747408274 |
| Sum | 626538 |
| Variance | 160.3688575 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 47 | 701 | 4.0% | |
| 45 | 671 | 3.8% | |
| 46 | 655 | 3.8% | |
| 43 | 654 | 3.7% | |
| 42 | 644 | 3.7% | |
| 44 | 594 | 3.4% | |
| 48 | 590 | 3.4% | |
| 41 | 553 | 3.2% | |
| 38 | 540 | 3.1% | |
| 39 | 539 | 3.1% | |
| Other values (64) | 11323 | 64.8% |
| Value | Count | Frequency (%) | |
| 1 | 4 | < 0.1% | |
| 2 | 39 | 0.2% | |
| 3 | 78 | 0.4% | |
| 4 | 109 | 0.6% | |
| 5 | 87 | 0.5% |
| Value | Count | Frequency (%) | |
| 91 | 1 | < 0.1% | |
| 90 | 1 | < 0.1% | |
| 86 | 1 | < 0.1% | |
| 83 | 1 | < 0.1% | |
| 81 | 1 | < 0.1% |
avg_word_length
Real number (ℝ≥0)
| Distinct count | 3807 |
|---|---|
| Unique (%) | 21.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.864222972373408 |
|---|---|
| Minimum | 3.0526315789473686 |
| Maximum | 122.11111111111113 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | 3.052631579 |
|---|---|
| 5-th percentile | 4.086956522 |
| Q1 | 4.634146341 |
| median | 5.243902439 |
| Q3 | 6.368421053 |
| 95-th percentile | 9.6 |
| Maximum | 122.1111111 |
| Range | 119.0584795 |
| Interquartile range (IQR) | 1.734274711 |
Descriptive statistics
| Standard deviation | 2.258274043 |
|---|---|
| Coefficient of variation (CV) | 0.3850934819 |
| Kurtosis | 414.1889093 |
| Mean | 5.864222972 |
| Median Absolute Deviation (MAD) | 0.743902439 |
| Skewness | 10.3631277 |
| Sum | 102412.79 |
| Variance | 5.099801653 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 5 | 169 | 1.0% | |
| 6 | 132 | 0.8% | |
| 4.5 | 101 | 0.6% | |
| 4.6 | 88 | 0.5% | |
| 4 | 83 | 0.5% | |
| 5.5 | 81 | 0.5% | |
| 7 | 73 | 0.4% | |
| 4.666666667 | 69 | 0.4% | |
| 4.833333333 | 69 | 0.4% | |
| 4.75 | 67 | 0.4% | |
| Other values (3797) | 16532 | 94.7% |
| Value | Count | Frequency (%) | |
| 3.052631579 | 1 | < 0.1% | |
| 3.146341463 | 1 | < 0.1% | |
| 3.153846154 | 1 | < 0.1% | |
| 3.2 | 1 | < 0.1% | |
| 3.213114754 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 122.1111111 | 1 | < 0.1% | |
| 36 | 1 | < 0.1% | |
| 31.75 | 1 | < 0.1% | |
| 27.42857143 | 1 | < 0.1% | |
| 24.66666667 | 1 | < 0.1% |
stopwords_count
Real number (ℝ≥0)
| Distinct count | 36 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.272159871736143 |
|---|---|
| Minimum | 0 |
| Maximum | 36 |
| Zeros | 682 |
| Zeros (%) | 3.9% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 7 |
| median | 13 |
| Q3 | 17 |
| 95-th percentile | 23 |
| Maximum | 36 |
| Range | 36 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 6.596125496 |
|---|---|
| Coefficient of variation (CV) | 0.5374869269 |
| Kurtosis | -0.6330147516 |
| Mean | 12.27215987 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 0.03804718459 |
| Sum | 214321 |
| Variance | 43.50887156 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 15 | 1002 | 5.7% | |
| 13 | 986 | 5.6% | |
| 14 | 957 | 5.5% | |
| 12 | 942 | 5.4% | |
| 17 | 932 | 5.3% | |
| 16 | 907 | 5.2% | |
| 10 | 862 | 4.9% | |
| 11 | 837 | 4.8% | |
| 18 | 806 | 4.6% | |
| 7 | 776 | 4.4% | |
| Other values (26) | 8457 | 48.4% |
| Value | Count | Frequency (%) | |
| 0 | 682 | 3.9% | |
| 1 | 307 | 1.8% | |
| 2 | 432 | 2.5% | |
| 3 | 519 | 3.0% | |
| 4 | 611 | 3.5% |
| Value | Count | Frequency (%) | |
| 36 | 1 | < 0.1% | |
| 35 | 2 | < 0.1% | |
| 33 | 4 | < 0.1% | |
| 32 | 7 | < 0.1% | |
| 31 | 10 | 0.1% |
char_count
Real number (ℝ≥0)
| Distinct count | 457 |
|---|---|
| Unique (%) | 2.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 233.99559092991296 |
|---|---|
| Minimum | 10 |
| Maximum | 1110 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | 10 |
|---|---|
| 5-th percentile | 93 |
| Q1 | 186 |
| median | 255 |
| Q3 | 279 |
| 95-th percentile | 339 |
| Maximum | 1110 |
| Range | 1100 |
| Interquartile range (IQR) | 93 |
Descriptive statistics
| Standard deviation | 75.58938762 |
|---|---|
| Coefficient of variation (CV) | 0.3230376578 |
| Kurtosis | 1.16897423 |
| Mean | 233.9955909 |
| Median Absolute Deviation (MAD) | 40 |
| Skewness | -0.3282087694 |
| Sum | 4086499 |
| Variance | 5713.755521 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 280 | 756 | 4.3% | |
| 279 | 667 | 3.8% | |
| 278 | 476 | 2.7% | |
| 277 | 360 | 2.1% | |
| 276 | 311 | 1.8% | |
| 275 | 254 | 1.5% | |
| 274 | 235 | 1.3% | |
| 273 | 188 | 1.1% | |
| 271 | 182 | 1.0% | |
| 272 | 181 | 1.0% | |
| Other values (447) | 13854 | 79.3% |
| Value | Count | Frequency (%) | |
| 10 | 5 | < 0.1% | |
| 11 | 1 | < 0.1% | |
| 12 | 12 | 0.1% | |
| 13 | 3 | < 0.1% | |
| 16 | 3 | < 0.1% |
| Value | Count | Frequency (%) | |
| 1110 | 1 | < 0.1% | |
| 602 | 1 | < 0.1% | |
| 601 | 1 | < 0.1% | |
| 595 | 1 | < 0.1% | |
| 594 | 1 | < 0.1% |
stopwords
Categorical
| Distinct count | 14801 |
|---|---|
| Unique (%) | 84.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 136.4 KiB |
| Value | Count | Frequency (%) | |
| [] | 682 | 3.9% | |
| ['and'] | 112 | 0.6% | |
| ['of', 'of', 'and', 'of', 'and', 'in', 'below'] | 51 | 0.3% | |
| ['in', 'for', 'and', 'on', 'on'] | 47 | 0.3% | |
| ['you'] | 43 | 0.2% | |
| ['to', 'and', 'for'] | 34 | 0.2% | |
| ['for', 'to', 'on', 'all', 'the', 'by', 'to', 'for', 'and', 'in', 'to', 'and'] | 34 | 0.2% | |
| ['into', 'than', 'and', 'is', 'more', 'the', 'of', 'the'] | 32 | 0.2% | |
| ['of', 'at', 'is', 'than', 'and', 'is', 'at'] | 31 | 0.2% | |
| ['of', 'so', 'is', 'than', 'and', 'is', 'at'] | 28 | 0.2% | |
| Other values (14791) | 16370 | 93.7% |
Length
| Max length | 242 |
|---|---|
| Median length | 86 |
| Mean length | 84.72371736 |
| Min length | 2 |
clean_text
Categorical
| Distinct count | 16317 |
|---|---|
| Unique (%) | 93.5% |
| Missing | 8 |
| Missing (%) | < 0.1% |
| Memory size | 136.4 KiB |
| Value | Count | Frequency (%) | |
| huge discount first time ever amazon history covid independence happy learning ebook amazon ebook amazon india | 47 | 0.3% | |
| julycoronavirus covid status total increase confirmed cases death test hospitalisation worldwide state newyorkcity please detail supporting reports facebook link | 39 | 0.2% | |
| time default bonds held china spreading covid_ covid causing trillions damage life global economy realdonaldtrump potus secpompeo trump uschina southchinasea taiwan hongkong vietnam huawei boycottchinese | 32 | 0.2% | |
| corner covid experience deaths million population lower belgium spain italy sweden france netherlands ireland | 28 | 0.2% | |
| heads lets minds together think nothing democrats whyre good like know help covid nasty president sorry want yall safe | 28 | 0.2% | |
| christ since take back earth plagues pestilence assaults like covid used destroy world empire create gods kingdom earth daniel | 24 | 0.1% | |
| something think according website covid france death rate virus death rate less wont hear fauci fake news media | 21 | 0.1% | |
| covid measures finish world empire told showed stupid right eyes mirror events show narration tells nature remedy daniel | 20 | 0.1% | |
| terms covid cases canadas experience million lower sweden spain iceland belgium ireland portugal italy switzerland netherlands | 19 | 0.1% | |
| thank | 19 | 0.1% | |
| Other values (16307) | 17179 | 98.4% |
Length
| Max length | 251 |
|---|---|
| Median length | 137 |
| Mean length | 129.1997251 |
| Min length | 3 |
Sentiment
Real number (ℝ)
| Distinct count | 2402 |
|---|---|
| Unique (%) | 13.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.05259429118950407 |
|---|---|
| Minimum | -1.0 |
| Maximum | 1.0 |
| Zeros | 4437 |
| Zeros (%) | 25.4% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | -0.4 |
| Q1 | -0.05421626984 |
| median | 0 |
| Q3 | 0.2 |
| 95-th percentile | 0.5 |
| Maximum | 1 |
| Range | 2 |
| Interquartile range (IQR) | 0.2542162698 |
Descriptive statistics
| Standard deviation | 0.2758412401 |
|---|---|
| Coefficient of variation (CV) | 5.24469926 |
| Kurtosis | 1.815630939 |
| Mean | 0.05259429119 |
| Median Absolute Deviation (MAD) | 0.1333333333 |
| Skewness | 0.05192925216 |
| Sum | 918.5067013 |
| Variance | 0.07608838973 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 4437 | 25.4% | |
| 0.5 | 422 | 2.4% | |
| 0.2 | 417 | 2.4% | |
| 0.25 | 381 | 2.2% | |
| -0.2 | 307 | 1.8% | |
| 0.1 | 296 | 1.7% | |
| -0.1 | 279 | 1.6% | |
| 0.4 | 257 | 1.5% | |
| -0.5 | 241 | 1.4% | |
| 0.8 | 223 | 1.3% | |
| Other values (2392) | 10204 | 58.4% |
| Value | Count | Frequency (%) | |
| -1 | 76 | 0.4% | |
| -0.9 | 6 | < 0.1% | |
| -0.9 | 1 | < 0.1% | |
| -0.875 | 2 | < 0.1% | |
| -0.8666666667 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 1 | 71 | 0.4% | |
| 0.9333333333 | 1 | < 0.1% | |
| 0.925 | 1 | < 0.1% | |
| 0.9 | 20 | 0.1% | |
| 0.875 | 1 | < 0.1% |
Target
Real number (ℝ)
| Distinct count | 18 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.494457169033441 |
|---|---|
| Minimum | -4.0 |
| Maximum | 15.0 |
| Zeros | 103 |
| Zeros (%) | 0.6% |
| Memory size | 136.4 KiB |
Quantile statistics
| Minimum | -4 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 4 |
| median | 8 |
| Q3 | 9 |
| 95-th percentile | 15 |
| Maximum | 15 |
| Range | 19 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 3.222535636 |
|---|---|
| Coefficient of variation (CV) | 0.4299891991 |
| Kurtosis | 0.463248983 |
| Mean | 7.494457169 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.1297645351 |
| Sum | 130883.2 |
| Variance | 10.38473592 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 9 | 3982 | 22.8% | |
| 4 | 2604 | 14.9% | |
| 7 | 1962 | 11.2% | |
| 3 | 1719 | 9.8% | |
| 10 | 1495 | 8.6% | |
| 8.7 | 1210 | 6.9% | |
| 8 | 1185 | 6.8% | |
| 15 | 1013 | 5.8% | |
| 6 | 857 | 4.9% | |
| 11 | 434 | 2.5% | |
| Other values (8) | 1003 | 5.7% |
| Value | Count | Frequency (%) | |
| -4 | 71 | 0.4% | |
| 0 | 103 | 0.6% | |
| 1 | 73 | 0.4% | |
| 2 | 84 | 0.5% | |
| 3 | 1719 | 9.8% |
| Value | Count | Frequency (%) | |
| 15 | 1013 | 5.8% | |
| 14 | 142 | 0.8% | |
| 12 | 85 | 0.5% | |
| 11 | 434 | 2.5% | |
| 10 | 1495 | 8.6% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
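The calculation described above can be sketched in plain Python. This is a minimal illustration on made-up toy data, not the code pandas-profiling runs. The function name `pearson_r` and the sample values are assumptions introduced for the example.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (std_x * std_y)

# Toy data, invented for illustration (not taken from the dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

r_xy = pearson_r(x, y)
# Shifting x and rescaling it by a positive factor leaves r unchanged.
r_shifted = pearson_r([10 * v + 3 for v in x], y)
# Negating y flips the sign of r.
r_negated = pearson_r(x, [-v for v in y])
```

The same divisor (n) is used for the covariance and both standard deviations, so it cancels out. Using n - 1 throughout would give the same r.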
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
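As a rough sketch of the description above, ρ can be computed by ranking both variables and then applying the Pearson formula to the ranks. The helper names and the toy data are assumptions made for this example. Ties are given their average rank, which is a common convention but a choice of this sketch.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

On this toy data ρ is 1 because the relationship is perfectly monotonic, while Pearson's r stays below 1 because it is not linear.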
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
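The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements it directly on made-up data. The function name `kendall_tau_a` and the sample values are assumptions for the example.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; tau-a counts it as neither
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

# Toy data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

tau = kendall_tau_a(x, y)                      # 5 concordant, 1 discordant
tau_reversed = kendall_tau_a(x, [4, 3, 2, 1])  # every pair is discordant
```

This loop is quadratic in the number of observations, so it is only meant to show the definition. Library implementations such as `scipy.stats.kendalltau` default to the tau-b variant, which adjusts the denominator for ties, so their results can differ from this sketch when the data contain tied values.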
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
| | Unnamed: 0 | date | username | replies_count | retweets_count | likes_count | video | geo | positive | death | word_count | avg_word_length | stopwords_count | char_count | stopwords | clean_text | Sentiment | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1579651200000000000 | realdonaldtrump | 9465 | 17624 | 88225 | 0 | NaN | 2 | 2.0 | 22 | 5.454545 | 7 | 141 | ['in', 'of', 'will', 'be', 'or', 'to', 'the'] | making great progress davos tremendous numbers companies coming returning hottest economy jobs jobs jobs | 0.566667 | 4.0 |
| 1 | 1 | 1579651200000000000 | realdonaldtrump | 8643 | 24619 | 98960 | 0 | NaN | 2 | 2.0 | 11 | 8.166667 | 5 | 109 | ['if', 'you', 'you', 'will', 'be'] | sorry come immediately sent back | -0.250000 | 4.0 |
| 2 | 2 | 1579651200000000000 | realdonaldtrump | 7035 | 24342 | 97513 | 0 | NaN | 2 | 2.0 | 6 | 12.571429 | 2 | 94 | ['you', 'on'] | fridaybig crowd | 0.000000 | 4.0 |
| 3 | 3 | 1579651200000000000 | realdonaldtrump | 3436 | 12031 | 50605 | 0 | NaN | 2 | 2.0 | 2 | 20.333333 | 0 | 63 | [] | true | 0.350000 | 4.0 |
| 4 | 4 | 1579651200000000000 | realdonaldtrump | 18086 | 19899 | 122408 | 0 | NaN | 2 | 2.0 | 2 | 6.000000 | 0 | 13 | [] | pressure | 0.000000 | 4.0 |
| 5 | 5 | 1579651200000000000 | realdonaldtrump | 2228 | 8103 | 39527 | 0 | NaN | 2 | 2.0 | 4 | 14.000000 | 1 | 74 | ['be'] | great | 0.800000 | 4.0 |
| 6 | 6 | 1579651200000000000 | realdonaldtrump | 1777 | 7588 | 36498 | 0 | NaN | 2 | 2.0 | 6 | 12.428571 | 2 | 93 | ['with', 'you'] | great working maria | 0.800000 | 4.0 |
| 7 | 7 | 1579651200000000000 | realdonaldtrump | 8460 | 19473 | 102575 | 0 | NaN | 2 | 2.0 | 48 | 4.229167 | 21 | 250 | ['of', 'the', 'about', 'our', 'just', 'with', 'is', 'that', 'it', 'will', 'both', 'the', 'in', 'so', 'other', 'with', 'a', 'who', 'his', 'more', 'to'] | many great things signed giant trade deal china bring china closer together many ways terrific working president truly loves country much come | 0.333333 | 4.0 |
| 8 | 8 | 1579651200000000000 | realdonaldtrump | 2116 | 4824 | 23572 | 0 | NaN | 2 | 2.0 | 20 | 5.100000 | 7 | 121 | ['be', 'at', 'by', 'on', 'at', 'the', 'in'] | interviewed eastern joesquawk cnbc world economic forum davos switzerland enjoy | 0.300000 | 4.0 |
| 9 | 9 | 1579651200000000000 | realdonaldtrump | 10559 | 21869 | 89693 | 0 | NaN | 2 | 2.0 | 31 | 4.354839 | 14 | 165 | ['the', 'to', 'up', 'the', 'in', 'the', 'by', 'the', 'that', 'he', 'to', 'and', 'did', 'the'] | senates mess made house democrats biden admitted went ukraine quid stevescalise foxnews | -0.175000 | 4.0 |
Last rows
| | Unnamed: 0 | date | username | replies_count | retweets_count | likes_count | video | geo | positive | death | word_count | avg_word_length | stopwords_count | char_count | stopwords | clean_text | Sentiment | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17454 | 17454 | 1596672000000000000 | mat945 | 0 | 0 | 0 | 0 | NaN | 4852143 | 151483.0 | 44 | 5.295455 | 14 | 276 | ['of', 'and', 'to', 'them', 'in', 'the', 'and', 'are', 'or', 'to', 'as', 'in', 'the', 'against'] | times challenge people become vulnerable seek strong proactive cohesive bipartisan government lead back recent events suggest federal state governments either unwilling unable stand together fight covid | -0.113333 | 3.0 |
| 17455 | 17455 | 1596672000000000000 | allangpaterson | 0 | 0 | 1 | 0 | NaN | 4852143 | 151483.0 | 18 | 7.111111 | 6 | 147 | ['will', 'what', 'a', 'and', 'for', 'the'] | million covid cases today dark demeaning statistic president | -0.150000 | 3.0 |
| 17456 | 17456 | 1596672000000000000 | carolynguzzi | 0 | 0 | 0 | 0 | NaN | 4852143 | 151483.0 | 60 | 4.683333 | 31 | 341 | ['in', 'is', 'all', 'a', 'and', 'after', 'the', 'all', 'over', 'with', 'the', 'will', 'not', 'be', 'or', 'it', 'will', 'be', 'as', 'a', 'we', 'have', 'a', 'for', 'and', 'can', 'this', 'in', 'and', 'it', 'is'] | like opposite high school political ploy election covid either existing described sara cure covid called hydroquoroquin safe | 0.165000 | 3.0 |
| 17457 | 17457 | 1596672000000000000 | pppjain | 0 | 3 | 2 | 0 | NaN | 4852143 | 151483.0 | 45 | 5.600000 | 17 | 297 | ['be', 'for', 'in', 'and', 'in', 'did', 'not', 'in', 'as', 'he', 'is', 'not', 'will', 'have', 'to', 'we', 'will'] | modiji honoured covid memorial prize highest covid cases india helping countries fighting corona fight india selfish wait soon cross covid__ | -0.250000 | 3.0 |
| 17458 | 17458 | 1596672000000000000 | carman17838926 | 0 | 0 | 0 | 0 | NaN | 4852143 | 151483.0 | 36 | 5.000000 | 3 | 216 | ['of', 'a', 'by'] | thanks trump wealthy senators abandoned poor folks especially including millionaire senators cassidy kennedy rare case state ravaged twice covid today | 0.120000 | 3.0 |
| 17459 | 17459 | 1596672000000000000 | renterialawfirm | 0 | 0 | 0 | 0 | NaN | 4852143 | 151483.0 | 22 | 9.260870 | 6 | 236 | ['in', 'for', 'a', 'and', 'who', 'for'] | today rocky ride exwho doctor helped eradicate smallpox predicts covid turmoil years | 0.000000 | 3.0 |
| 17460 | 17460 | 1596672000000000000 | squarerootal2 | 2 | 0 | 3 | 0 | NaN | 4852143 | 151483.0 | 40 | 5.175000 | 13 | 246 | ['where', 'and', 'are', 'where', 'it', 'has', 'been', 'and', 'are', 'has', 'no', 'and', 'his'] | places covid thrives people dying places flattened deaths europe scandinavia china australia president national plan comrades like jim_jordan think thats | 0.000000 | 3.0 |
| 17461 | 17461 | 1596672000000000000 | miasrule | 1 | 0 | 0 | 0 | NaN | 4852143 | 151483.0 | 31 | 4.516129 | 12 | 170 | ['be', 'the', 'out', 'of', 'not', 'just', 'our', 'we', 'are', 'at', 'is', 'as'] | agreed lets clear taking current covid disaster fellow humans good killing violence american apple | 0.200000 | 3.0 |
| 17462 | 17462 | 1596672000000000000 | lindalouwhoh | 0 | 0 | 0 | 0 | NaN | 4852143 | 151483.0 | 42 | 5.571429 | 11 | 278 | ['it', 'to', 'not', 'but', 'and', 'or', 'those', 'with', 'and', 'to', 'the'] | comes covid resembles wealthy powerful countries instead poorer countries like brazil peru south africa large migrant populations like bahrain oman unique failure control virus | 0.214524 | 3.0 |
| 17463 | 17463 | 1596672000000000000 | acai_w | 0 | 0 | 0 | 0 | NaN | 4852143 | 151483.0 | 55 | 4.781818 | 27 | 318 | ['such', 'a', 'and', 'and', 'to', 'you', 'with', 'a', 'of', 'an', 'of', 'the', 'in', 'the', 'and', 'how', 'the', 'is', 'an', 'on', 'to', 'whom', 'the', 'are', 'being', 'out', 'and'] | bold honest statement brought integrity please give update covid spread country handling give update bailouts paid | 0.466667 | 3.0 |